An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes

Authors

  • Chih-Hao Chen
  • Hsing-Chung Lee
  • Qingdong Ling
  • Hsiao-Rong Chen
  • Yi-An Ko
  • Tsong-Shan Tsou
  • Sun-Chong Wang
  • Li-Ching Wu
  • H. C. Lee
Abstract

Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ∼1 s and a 1.8 million-probe array in ∼8 s.
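
The two timings quoted above can be checked against the linear-scaling claim with simple arithmetic. The sketch below is only a consistency check on the approximate figures from the abstract ("∼1 s" and "∼8 s"); it is not part of SAD, and the variable and function names are illustrative assumptions.

```python
# Approximate timings quoted in the abstract: probes -> seconds.
# These are rounded figures, so the result is only indicative.
timings = {250_000: 1.0, 1_800_000: 8.0}


def per_probe_microseconds(probes: int, seconds: float) -> float:
    """Average time spent per probe, in microseconds."""
    return seconds / probes * 1e6


# Per-probe cost for each array size.
rates = {n: per_probe_microseconds(n, t) for n, t in timings.items()}

# If running time is linear in array size, these two ratios should be close.
size_ratio = 1_800_000 / 250_000  # 7.2
time_ratio = timings[1_800_000] / timings[250_000]  # 8.0

for n, rate in rates.items():
    print(f"{n:>9} probes: {rate:.2f} microseconds per probe")
print(f"size ratio {size_ratio:.1f}, time ratio {time_ratio:.1f}")
```

The per-probe cost stays at roughly 4 microseconds for both array sizes, and the time ratio (8.0) is close to the size ratio (7.2), which is consistent with approximately linear scaling given how coarsely the timings are rounded.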

Similar resources

BIRC5 Genomic Copy Number Variation in Early-Onset Breast Cancer

Background: The baculoviral inhibitor of apoptosis repeat-containing 5 (BIRC5) gene is an inhibitor of apoptosis that is expressed in human embryonic tissues but is absent from most healthy adult tissues. The copy number of BIRC5 has been reported to be highly increased in tumor tissues; however, its association with the age of onset in breast cancer is not well understood. Methods: Forty tumor tiss...

Analytical evaluation of an innovative decision-making algorithm for VM live migration

Two strategies, "pre-copy" and "post-copy", have been proposed for live migration of virtual machines. Depending on the operating conditions of the machine, either strategy may perform better than the other. In this article, a new algorithm is presented that automatically decides how the virtual machine live migration takes place. In this approach, the virtual machine m...

Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species

Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome and is an important wild bio-resource for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic implications in plants. Different ch...

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

A Pairwise-Gaussian-Merging Approach: Towards Genome Segmentation for Copy Number Analysis

Segmentation, filtering out of measurement errors and identification of breakpoints are integral parts of any analysis of microarray data for the detection of copy number variation (CNV). Existing algorithms designed for these tasks have had some successes in the past, but they tend to be O(N^2) in either computation time or memory requirement, or both, and the rapid advance of microarray resolut...

Journal title:

Volume 39, Issue

Pages -

Publication date: 2011